In this paper we explore the task of modeling (semi-)structured object sequences; in particular, we focus on the problem of developing a structure-aware input representation for such sequences. We assume that each structured object in a sequence is represented by a set of key-value pairs that encode its attributes. Given a universe of keys, a sequence of structured objects can then be viewed as an evolution of the values for each key over time. We encode and construct a sequential representation using the values for a particular key (Temporal Value Modeling - TVM) and then self-attend over the set of key-conditioned value sequences to create a representation of the structured object sequence (Key Aggregation - KA). We pre-train and fine-tune the two components independently and present an innovative training schedule that interleaves the training of both modules with shared attention heads. We find that this iterative two-part training results in better performance than a unified network with hierarchical encoding, as well as other methods that use a {\em record-view} representation of the sequence \cite{de2021transformers4rec} or a simple {\em flattened} representation of the sequence. We conduct experiments using real-world data to demonstrate the advantage of interleaving TVM-KA on multiple tasks, along with detailed ablation studies motivating our modeling choices. We find that our approach performs better than flattening sequence objects and also allows us to operate on significantly larger sequences than existing methods.
Translated by Google Translate
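The key-conditioned view described in the abstract above can be illustrated with a small data-layout sketch. This is not the authors' model: it only shows how a sequence of key-value objects might be pivoted into one value sequence per key (the input TVM would consume), next to a record view. The function names, the `MISSING` placeholder, and the toy `events` data are all assumptions made for illustration.

```python
MISSING = "<missing>"  # hypothetical placeholder for keys absent from an object


def record_view(objects):
    """Record view: one list of 'key=value' tokens per time step."""
    return [[f"{k}={v}" for k, v in sorted(obj.items())] for obj in objects]


def key_conditioned_sequences(objects):
    """Pivot the sequence into one value sequence per key, ordered by time."""
    keys = sorted({k for obj in objects for k in obj})
    return {k: [obj.get(k, MISSING) for obj in objects] for k in keys}


# Toy sequence of structured objects (illustrative data only).
events = [
    {"status": "open", "owner": "ana"},
    {"status": "assigned", "owner": "ben", "priority": "high"},
    {"status": "closed", "owner": "ben"},
]

per_key = key_conditioned_sequences(events)
for key, values in per_key.items():
    print(key, values)
```

Each per-key sequence has the same length as the object sequence, so a separate encoder could process the sequences independently before any aggregation across keys.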
Dense retrievers have made significant strides in obtaining state-of-the-art results on text retrieval and open-domain question answering (ODQA). Yet most of these achievements were made possible with the help of large annotated datasets, and unsupervised learning for dense retrieval models remains an open problem. In this work, we explore two categories of methods for creating pseudo query-document pairs, named query extraction (QExt) and transferred query generation (TQGen), to augment retriever training in an annotation-free and scalable manner. Specifically, QExt extracts pseudo queries from document structures or by selecting salient random spans, and TQGen utilizes generation models trained for other NLP tasks (e.g., summarization) to produce pseudo queries. Extensive experiments show that dense retrievers trained with individual augmentation methods can perform comparably well with multiple strong baselines, and combining them leads to further improvements, achieving state-of-the-art performance of unsupervised dense retrieval on both BEIR and ODQA datasets.
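As a rough illustration of the span-based variant of QExt, the sketch below draws a random contiguous span from each document and pairs it with that document as a pseudo query. It is a simplified assumption, not the authors' procedure: no saliency scoring or document-structure heuristics are applied, and the function names and span-length bounds are hypothetical.

```python
import random


def extract_random_span(document, min_len=3, max_len=8, rng=None):
    """Return a random contiguous token span of the document as a pseudo query."""
    rng = rng or random.Random()
    tokens = document.split()
    if not tokens:
        raise ValueError("document is empty")
    # Clamp the span length so short documents still yield a valid span.
    span_len = min(rng.randint(min_len, max_len), len(tokens))
    start = rng.randint(0, len(tokens) - span_len)
    return " ".join(tokens[start:start + span_len])


def make_pseudo_pairs(documents, seed=0):
    """Build (pseudo_query, document) training pairs without any annotation."""
    rng = random.Random(seed)
    return [(extract_random_span(doc, rng=rng), doc) for doc in documents]


docs = [
    "dense retrievers map queries and documents into a shared vector space",
    "the gillespie algorithm simulates stochastic reaction networks exactly",
]
pairs = make_pseudo_pairs(docs)
for query, doc in pairs:
    print(repr(query), "->", doc[:40])
```

Passing a seeded `random.Random` keeps the generated pairs reproducible across runs.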
Continuous-time Markov chains are used to model stochastic systems where transitions can occur at irregular times, e.g., birth-death processes, chemical reaction networks, population dynamics, and gene regulatory networks. We develop a method to learn a continuous-time Markov chain's transition rate functions from fully observed time series. In contrast with existing methods, our method allows for transition rates to depend nonlinearly on both state variables and external covariates. The Gillespie algorithm is used to generate trajectories of stochastic systems where propensity functions (reaction rates) are known. Our method can be viewed as the inverse: given trajectories of a stochastic reaction network, we generate estimates of the propensity functions. While previous methods used linear or log-linear methods to link transition rates to covariates, we use neural networks, increasing the capacity and potential accuracy of learned models. In the chemical context, this enables the method to learn propensity functions from non-mass-action kinetics. We test our method with synthetic data generated from a variety of systems with known transition rates. We show that our method learns these transition rates with considerably more accuracy than log-linear methods, in terms of mean absolute error between ground truth and predicted transition rates. We also demonstrate an application of our methods to open-loop control of a continuous-time Markov chain.
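For context, the forward direction mentioned in the abstract above (simulating trajectories when propensities are known) can be sketched with the Gillespie algorithm for a simple birth-death process. The learning method itself is not shown. The function name, the constant birth propensity, the linear death propensity, and the parameter values are assumptions chosen for illustration.

```python
import random


def gillespie_birth_death(x0, birth_rate, death_rate, t_max, seed=0):
    """Simulate one birth-death trajectory with the Gillespie algorithm.

    Assumed propensities: births occur at constant rate `birth_rate`,
    deaths at rate `death_rate * x` for current population x.
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, states = [t], [x]
    while True:
        a_birth = birth_rate
        a_death = death_rate * x
        a_total = a_birth + a_death
        if a_total <= 0.0:
            break  # no transition can fire any more
        # Waiting time to the next event is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_max:
            break
        # Choose which event fires, proportionally to its propensity.
        if rng.random() * a_total < a_birth:
            x += 1
        else:
            x -= 1
        times.append(t)
        states.append(x)
    return times, states


times, states = gillespie_birth_death(x0=5, birth_rate=2.0, death_rate=0.5, t_max=10.0)
print(len(times), "events; final population:", states[-1])
```

The resulting `(times, states)` pairs are the kind of fully observed, irregularly timed trajectory from which transition rates would be estimated in the inverse problem.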
This paper focuses on a stochastic system identification problem: given time series observations of a stochastic differential equation (SDE) driven by L\'{e}vy $\alpha$-stable noise, estimate the SDE's drift field. For $\alpha$ in the interval $[1,2)$, the noise is heavy-tailed, leading to computational difficulties for methods that compute transition densities and/or likelihoods in physical space. We propose a Fourier space approach that centers on computing time-dependent characteristic functions, i.e., Fourier transforms of time-dependent densities. Parameterizing the unknown drift field using Fourier series, we formulate a loss consisting of the squared error between predicted and empirical characteristic functions. We minimize this loss with gradients computed via the adjoint method. For a variety of one- and two-dimensional problems, we demonstrate that this method is capable of learning drift fields in qualitative and/or quantitative agreement with ground truth fields.
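The loss described in the abstract above compares predicted and empirical characteristic functions. The sketch below shows only that comparison, under simplifying assumptions: the samples are standard normal (used as a stand-in because its characteristic function $\exp(-s^2/2)$ is known in closed form), no SDE is integrated, and no adjoint gradients are computed. The function names and the frequency grid are hypothetical.

```python
import cmath
import math
import random


def empirical_cf(samples, freqs):
    """Empirical characteristic function: the mean of exp(i*s*x) over samples."""
    n = len(samples)
    return [sum(cmath.exp(1j * s * x) for x in samples) / n for s in freqs]


def cf_squared_error(predicted, empirical):
    """Sum of squared moduli of the differences between two characteristic functions."""
    return sum(abs(p - e) ** 2 for p, e in zip(predicted, empirical))


# Assumption: standard normal samples stand in for observed data.
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(5000)]
freqs = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Closed-form characteristic function of the standard normal distribution.
predicted = [complex(math.exp(-0.5 * s * s)) for s in freqs]
loss = cf_squared_error(predicted, empirical_cf(samples, freqs))
print("squared-error loss:", loss)
```

Because characteristic functions are bounded in modulus by 1 regardless of how heavy the tails are, this comparison avoids evaluating densities or likelihoods in physical space.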
Predicting emotions expressed in text is a well-studied problem in the NLP community. Recently there has been active research in extracting the cause of an emotion expressed in text. Most previous work has addressed causal emotion entailment in documents. In this work, we propose neural models to extract emotion cause spans and entailment in conversations. To learn such models, we use the RECCON dataset, which is annotated with cause spans at the utterance level. In particular, we propose MuTEC, an end-to-end Multi-Task learning framework for extracting emotions, emotion causes, and entailment in conversations. This is in contrast to existing baseline models that use ground truth emotions to extract the cause. MuTEC performs better than the baselines for most of the data folds provided in the dataset.
A vast amount of expert and domain knowledge is captured by causal structural priors, yet there has been little research on testing such priors for generalization and data synthesis purposes. We propose a novel model architecture, Causal Structural Hypothesis Testing, that can use nonparametric, structural causal knowledge and approximate a causal model's functional relationships using deep neural networks. We use these architectures for comparing structural priors, akin to hypothesis testing, using a deliberate (non-random) split of training and testing data. Extensive simulations demonstrate the effectiveness of out-of-distribution generalization error as a proxy for causal structural prior hypothesis testing and offer a statistical baseline for interpreting results. We show that the variational version of the architecture, Causal Structural Variational Hypothesis Testing, can improve performance in low SNR regimes. Due to the simplicity and low parameter count of the models, practitioners can test and compare structural prior hypotheses on small datasets and use the priors with the best generalization capacity to synthesize much larger, causally-informed datasets. Finally, we validate our methods on a synthetic pendulum dataset, and show a use-case on a real-world trauma surgery ground-level falls dataset.
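One simple way to realize a deliberate (non-random) train/test split of the kind the abstract above mentions is to hold out a range of a chosen variable, so that test error measures out-of-distribution generalization. The abstract does not specify how the authors construct their split; the thresholding rule, the function name, and the toy rows below are assumptions for illustration only.

```python
def split_by_threshold(rows, key, threshold):
    """Deliberate split: train on rows with row[key] <= threshold, test on the rest."""
    train = [r for r in rows if r[key] <= threshold]
    test = [r for r in rows if r[key] > threshold]
    return train, test


# Toy rows (illustrative data only): y depends on x.
rows = [{"x": float(i), "y": 2.0 * i + 1.0} for i in range(10)]
train, test = split_by_threshold(rows, key="x", threshold=6.0)
print(len(train), "training rows;", len(test), "out-of-distribution test rows")
```

Unlike a random split, the test rows here lie entirely outside the range of `x` seen in training, so a model that fits the training range but encodes the wrong structure is more likely to be penalized.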
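The SKA pulsar search pipeline will be used for real-time detection of pulsars. Modern radio telescopes such as the SKA will generate data throughout their full-scale operation. Hence, experience-based and data-driven algorithms are indispensable for applications such as candidate detection. Here, we describe our findings from testing a state-of-the-art object detection algorithm called Mask R-CNN to detect candidate signatures in the SKA pulsar search pipeline. We have trained the Mask R-CNN model to detect candidate images. A custom annotation tool was developed to efficiently mark regions of interest in large datasets. We have successfully demonstrated this algorithm by detecting candidate signatures in a simulated dataset. This paper presents the details of this work and highlights future prospects.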
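This paper proposes a novel non-intrusive system failure prediction technique that uses available information from developers and minimal information from raw logs (rather than mining entire logs), while keeping the data entirely private with the data owners. A neural-network-based multi-class classifier is developed for failure prediction, using an artificially generated anonymous dataset and applying a combination of techniques, namely genetic algorithm (steps), pattern repetition, etc., to train and test the network. The proposed mechanism completely decouples the dataset used for the training process from the actual data, which is kept private. Moreover, a multi-criteria decision-making (MCDM) scheme is used to prioritize failures according to business requirements. Results show the failure prediction accuracy under different parameter configurations. In a broader context, any classification problem can be carried out with the proposed mechanism using an artificially generated dataset, without seeing the actual data, as long as the input features can be translated to binary values (e.g., output from private binary classifiers) and classification-as-a-service can be provided.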
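Mobile service robots are becoming increasingly ubiquitous. However, these robots can pose potential accessibility and safety problems for people with visual impairments (PVI). We sought to explore the challenges PVI face with mainstream mobile service robots and to identify their needs. We interviewed 17 PVI about their experiences with three emerging robots: vacuum robots, delivery robots, and drones. We comprehensively investigated PVI's robot experiences by considering their different roles around robots (direct users and bystanders). Our study highlights participants' challenges and concerns about the accessibility, safety, and privacy of mobile service robots. We found that the lack of accessible feedback made it difficult for PVI to precisely control, locate, and track the status of the robots. Moreover, encountering mobile robots as bystanders confused and even scared participants, presenting safety and privacy barriers. We further distill design considerations for more accessible and safe robots for PVI.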
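Movement synchrony reflects the coordination of body movements between interacting dyads. The estimation of movement synchrony has been automated by powerful deep learning models such as transformer networks. However, instead of designing a specialized network for movement synchrony estimation, previous transformer-based works broadly adopted architectures from other tasks such as human activity recognition. Therefore, this paper proposes a skeleton-based graph transformer for movement synchrony estimation. The proposed model applies ST-GCN, a spatial-temporal graph convolutional neural network, for skeleton feature extraction, followed by a spatial transformer for spatial feature generation. The spatial transformer is guided by a uniquely designed joint position embedding shared across the same joints of the individuals. In addition, considering the periodic nature of body movements, we incorporate a temporal similarity matrix into the temporal attention computation. Furthermore, the confidence score associated with each joint reflects the uncertainty of a pose, which has not been sufficiently emphasized in previous works on movement synchrony estimation. Since transformer networks demand a large amount of data to train, we constructed a dataset for movement synchrony estimation using Human3.6M, a benchmark dataset for human activity recognition, and pre-trained our model on it using contrastive learning. We further applied knowledge distillation to alleviate the information loss introduced by pose detector failure in a privacy-preserving way. We compared our method with representative approaches on PT13, a dataset collected from autism therapy interventions. Our method achieved an overall accuracy of 88.98% and surpassed its counterparts while maintaining data privacy.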